Bayesian Bandits, Secretaries, and Vanishing Computational Regret

Authors

  • Ashish Goel ∗
  • Sudipto Guha †
  • Kamesh Munagala ‡
Abstract

We consider the finite-horizon multi-armed bandit problem under the standard stochastic assumption of independent priors over the reward distributions of the arms. We define a new notion of computational regret, measured against the Bayesian optimum solution instead of worst-case regret against the true underlying distributions. We show that when the priors of the arms satisfy a log-concavity condition, there is a simple index-type policy that achieves per-step computational regret O((log log T)/(log T)), regardless of how the number n of arms relates to the time horizon T. In particular, the per-step regret vanishes as T → ∞. As a corollary, we obtain an additive PTAS for this problem. This complements the existing literature on worst-case regret bounds, which only hold when n is much smaller than T. The log-concavity condition is widely used and is satisfied by Beta, Gaussian, and Uniform priors, the most common priors in these settings. Our policy is far simpler to implement than the well-known Gittins index policy, which is moreover not optimal in the finite-horizon case. We also give evidence that the log-concavity condition is necessary for the type of regret bounds we show. Finally, we show that our results extend to the related “budgeted learning” and secretary problems.

∗ Departments of Management Science and Engineering and (by courtesy) Computer Science, Stanford University. Email: [email protected]. Research supported by an NSF ITR grant, the Stanford-KAUST alliance for academic excellence, and gifts from Google, Microsoft, and Cisco.
† Department of Computer and Information Sciences, University of Pennsylvania, Philadelphia PA 19104-6389. Email: [email protected]. Research supported in part by an Alfred P. Sloan Research Fellowship, an NSF CAREER Award, and NSF Award CCF-0644119.
‡ Department of Computer Science, Duke University, Durham NC 27708-0129. Research supported by an Alfred P. Sloan Research Fellowship, and by NSF via a CAREER award and grant CNS-0540347. Email: [email protected].
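The abstract does not spell out the index policy itself. As a rough, hypothetical illustration of the finite-horizon, Beta-prior setting it concerns, the sketch below runs Thompson sampling (a different, well-known index-by-posterior-sampling rule, not the paper's policy) on a Beta-Bernoulli instance and reports per-step regret against the best fixed arm, a simplification of the paper's Bayesian-optimum benchmark. All names and parameters here are illustrative.

```python
# Minimal finite-horizon Beta-Bernoulli bandit sketch (illustration only).
# Thompson sampling stands in for the paper's index policy, and regret is
# measured against the best fixed arm rather than the Bayesian optimum.
import numpy as np

def thompson_run(true_means, horizon, rng):
    n = len(true_means)
    alpha = np.ones(n)  # Beta(1, 1) = Uniform prior on each arm's mean
    beta = np.ones(n)
    reward_total = 0.0
    for _ in range(horizon):
        # Index step: sample one draw from each posterior, play the argmax.
        samples = rng.beta(alpha, beta)
        arm = int(np.argmax(samples))
        reward = float(rng.random() < true_means[arm])  # Bernoulli reward
        alpha[arm] += reward        # posterior update on success
        beta[arm] += 1.0 - reward   # posterior update on failure
        reward_total += reward
    return reward_total

rng = np.random.default_rng(0)
true_means = rng.uniform(size=10)   # arms drawn from the Uniform prior
T = 10_000
got = thompson_run(true_means, T, rng)
best = true_means.max() * T
print(f"per-step regret vs. best arm: {(best - got) / T:.4f}")
```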

Similar papers

PAC-Bayesian Analysis of Contextual Bandits

We derive an instantaneous (per-round) data-dependent regret bound for stochastic multiarmed bandits with side information (also known as contextual bandits). The scaling of our regret bound with the number of states (contexts) N goes as...


On Bayesian Upper Confidence Bounds for Bandit Problems

Stochastic bandit problems have been analyzed from two different perspectives: a frequentist view, where the parameter is a deterministic unknown quantity, and a Bayesian approach, where the parameter is drawn from a prior distribution. We show in this paper that methods derived from this second perspective prove optimal when evaluated using the frequentist cumulated regret as a measure of perf...
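The Bayesian upper-confidence-bound idea summarized above is easy to sketch for Bernoulli arms with Beta priors: play the arm whose posterior upper quantile is largest. A minimal hypothetical sketch follows, assuming the common quantile schedule 1 - 1/t; that schedule is a convention, not a detail taken from this abstract.

```python
# Hypothetical Bayes-UCB-style index for Bernoulli arms with Beta priors:
# play the arm with the largest posterior (1 - 1/t)-quantile.
import numpy as np
from scipy.stats import beta as beta_dist

def bayes_ucb(true_means, horizon, rng):
    n = len(true_means)
    a = np.ones(n)  # Beta posterior parameters, starting from Uniform
    b = np.ones(n)
    for t in range(1, horizon + 1):
        q = 1.0 - 1.0 / t                 # quantile level grows with t
        indices = beta_dist.ppf(q, a, b)  # posterior upper quantiles
        arm = int(np.argmax(indices))
        reward = float(rng.random() < true_means[arm])
        a[arm] += reward
        b[arm] += 1.0 - reward
    return a - 1.0  # success counts per arm

rng = np.random.default_rng(1)
print(bayes_ucb([0.3, 0.5, 0.7], 2000, rng))
```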


Gaussian Process bandits with adaptive discretization

In this paper, the problem of maximizing a black-box function f : X → R is studied in the Bayesian framework with a Gaussian Process (GP) prior. In particular, a new algorithm for this problem is proposed, and high probability bounds on its simple and cumulative regret are established. The query point selection rule in most existing methods involves an exhaustive search over an increasingly fin...
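For intuition, here is a minimal hypothetical sketch of GP optimization with an upper-confidence rule on a fixed grid; the adaptive discretization that is the paper's actual contribution is deliberately omitted, and the kernel and parameter choices are illustrative.

```python
# Simplified GP-UCB stand-in on a fixed grid (not the paper's algorithm):
# fit a GP posterior to past queries, then query the argmax of mean + bonus.
import numpy as np

def rbf(x, y, ls=0.2):
    # Squared-exponential kernel with unit prior variance.
    return np.exp(-0.5 * ((x[:, None] - y[None, :]) / ls) ** 2)

def gp_ucb(f, grid, rounds, noise=0.1, beta=4.0, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    X, y = [], []
    for _ in range(rounds):
        if not X:
            x = grid[rng.integers(len(grid))]  # first query is random
        else:
            Xa = np.array(X)
            K = rbf(Xa, Xa) + noise**2 * np.eye(len(Xa))
            Ks = rbf(grid, Xa)
            mu = Ks @ np.linalg.solve(K, np.array(y))
            var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
            x = grid[np.argmax(mu + np.sqrt(beta) * np.sqrt(np.maximum(var, 0)))]
        X.append(x)
        y.append(f(x) + noise * rng.standard_normal())  # noisy observation
    return max(f(x) for x in X)  # best (noiseless) value found, for display

grid = np.linspace(0, 1, 200)
f = lambda x: np.sin(6 * x)  # unknown black-box objective
print(gp_ucb(f, grid, 30))
```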


Matroid Bandits: Practical Large-Scale Combinatorial Bandits

A matroid is a notion of independence that is closely related to computational efficiency in combinatorial optimization. In this work, we bring together the ideas of matroids and multiarmed bandits, and propose a new class of stochastic combinatorial bandits, matroid bandits. A key characteristic of this class is that matroid bandits can be solved both computationally and sample efficiently. We...
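For intuition, a hypothetical sketch for the simplest special case, the uniform matroid ("select any k of n arms"), follows: each round, greedily pick the k arms with the highest optimistic mean estimates, then observe their individual rewards. The paper's algorithm handles general matroids; this stand-in does not.

```python
# Uniform-matroid bandit stand-in: greedy top-k selection by UCB estimates
# with semi-bandit feedback (each chosen arm's reward is observed).
import numpy as np

def uniform_matroid_bandit(true_means, k, horizon, rng):
    n = len(true_means)
    counts = np.zeros(n)
    sums = np.zeros(n)
    for t in range(1, horizon + 1):
        # Optimistic estimates; unpulled arms get +inf so each is tried once.
        with np.errstate(divide='ignore', invalid='ignore'):
            ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        ucb[counts == 0] = np.inf
        basis = np.argsort(-ucb)[:k]  # greedy is optimal over a matroid
        rewards = (rng.random(n) < true_means)[basis].astype(float)
        counts[basis] += 1
        sums[basis] += rewards
    return sums.sum()

rng = np.random.default_rng(2)
means = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.1])
print(uniform_matroid_bandit(means, k=3, horizon=5000, rng=rng))
```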


Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

I prove near-optimal frequentist regret guarantees for the finite-horizon Gittins index strategy for multi-armed bandits with Gaussian noise and prior. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss computational issues and present experimental results suggesting that a particular version of the Git...
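Finite-horizon index computations of this flavor can be done by backward induction. The hypothetical sketch below does this for a Beta-Bernoulli arm rather than the Gaussian model the abstract studies: it binary-searches for the per-step retirement payoff at which retiring immediately matches playing on, a standard way to define such an index.

```python
# Hypothetical finite-horizon Gittins-style index for a Beta(a, b) Bernoulli
# arm: the per-step retirement payoff lam at which pulling and retiring are
# equally good, found by binary search over a backward-induction DP.
from functools import lru_cache

def value(a, b, m, lam):
    # Optimal value with m steps left: retire (lam per remaining step) or pull.
    @lru_cache(maxsize=None)
    def V(da, db, steps):
        if steps == 0:
            return 0.0
        p = (a + da) / (a + da + b + db)  # posterior mean after da, db updates
        pull = p * (1.0 + V(da + 1, db, steps - 1)) \
             + (1.0 - p) * V(da, db + 1, steps - 1)
        return max(steps * lam, pull)
    return V(0, 0, m)

def finite_horizon_index(a, b, m, tol=1e-6):
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = (lo + hi) / 2.0
        # If retiring immediately is already optimal at lam, the index <= lam.
        if value(a, b, m, lam) <= m * lam + 1e-12:
            hi = lam
        else:
            lo = lam
    return (lo + hi) / 2.0

print(finite_horizon_index(1, 1, m=30))  # index of a fresh Uniform-prior arm
```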




Publication date: 2009